Hyperbolic Face Anti-Spoofing
Learning generalized face anti-spoofing (FAS) models against presentation
attacks is essential for the security of face recognition systems. Previous FAS
methods usually encourage models to extract discriminative features, reducing
the distances within the same class (bonafide or attack) while enlarging those
between bonafide and attack samples. However, these methods are built on
Euclidean distance, which generalizes poorly to unseen attacks because
Euclidean space embeds hierarchical structure poorly. Motivated by evidence that
different spoofing attacks are intrinsically hierarchical, we propose to learn
richer hierarchical and discriminative spoofing cues in
hyperbolic space. Specifically, for unimodal FAS learning, the feature
embeddings are projected into the Poincaré ball, and then the hyperbolic
binary logistic regression layer is cascaded for classification. To further
improve generalization, we apply hyperbolic contrastive learning to bonafide
samples only, while relaxing the constraints on diverse spoofing attacks. To
alleviate the vanishing gradient problem in hyperbolic space, a new feature
clipping method is proposed to enhance the training stability of hyperbolic
models. We further design a multimodal FAS framework with Euclidean
multimodal feature decomposition and hyperbolic multimodal feature fusion &
classification. Extensive experiments on three benchmark datasets (i.e., WMCA,
PADISI-Face, and SiW-M) with diverse attack types demonstrate that the proposed
method significantly outperforms the Euclidean baselines on
unseen attack detection. In addition, the proposed framework also
generalizes well on four benchmark datasets (i.e., MSU-MFSD, IDIAP
REPLAY-ATTACK, CASIA-FASD, and OULU-NPU) with a limited number of attack types.
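To make the Poincaré-ball operations above concrete, here is a minimal NumPy sketch (not the authors' code) of three generic building blocks: clipping a Euclidean feature before projection, the exponential map at the origin that projects it into a Poincaré ball of curvature -c, and the Poincaré geodesic distance on which a hyperbolic contrastive loss could be built. The clipping rule (rescale to norm at most `r`) and the names `clip_feature`, `expmap0`, and `poincare_dist` are illustrative assumptions; the paper's own feature clipping method may differ.

```python
import numpy as np

def clip_feature(x, r=1.0):
    """Rescale a Euclidean feature so its L2 norm is at most r.

    Assumed (hypothetical) clipping rule: it keeps projected points away from
    the ball boundary, where hyperbolic gradients tend to vanish.
    """
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    if norm <= r:
        return x
    return x * (r / norm)

def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature -c."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v.copy()
    sqrt_c = np.sqrt(c)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_dist(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff2 = np.sum((x - y) ** 2)
    denom = (1.0 - c * np.sum(x ** 2)) * (1.0 - c * np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * c * diff2 / denom) / np.sqrt(c)

# Clip a raw feature, project it into the ball, and measure its distance to the origin.
feat = np.array([3.0, 4.0])
p = expmap0(clip_feature(feat, r=1.0))
print(np.linalg.norm(p) < 1.0)          # True: the projected point lies inside the unit ball
print(poincare_dist(np.zeros(2), p))    # approximately 2.0, i.e. 2 * ||clipped feature||
```

Clipping in Euclidean space before `expmap0` bounds how close any embedding can get to the boundary, which is the property a clipping scheme relies on to keep gradients from vanishing.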
Forgery-aware Adaptive Vision Transformer for Face Forgery Detection
With the advancement in face manipulation technologies, the importance of
face forgery detection in protecting authentication integrity becomes
increasingly evident. Previous Vision Transformer (ViT)-based detectors have
demonstrated subpar performance in cross-database evaluations, primarily
because fully fine-tuning with limited Deepfake data often leads to forgetting
pre-trained knowledge and over-fitting to data-specific knowledge. To circumvent
these issues, we propose a novel Forgery-aware Adaptive Vision Transformer
(FA-ViT). In FA-ViT, the vanilla ViT's parameters are frozen to preserve its
pre-trained knowledge, while two specially designed components, the Local-aware
Forgery Injector (LFI) and the Global-aware Forgery Adaptor (GFA), are employed
to adapt forgery-related knowledge. Our proposed FA-ViT effectively combines
these two different types of knowledge to form the general forgery features for
detecting Deepfakes. Specifically, LFI captures local discriminative
information and incorporates this information into ViT via
Neighborhood-Preserving Cross Attention (NPCA). Simultaneously, GFA learns
adaptive knowledge in the self-attention layer, bridging the gap between the
two different domains. Furthermore, we design a novel Single Domain Pairwise
Learning (SDPL) to facilitate fine-grained information learning in FA-ViT.
Extensive experiments demonstrate that FA-ViT achieves state-of-the-art
performance in cross-dataset and cross-manipulation evaluations, and
improves robustness against unseen perturbations.
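The shared idea of keeping the pre-trained weights frozen while training small added modules can be illustrated generically. Below is a minimal NumPy sketch assuming a toy frozen linear layer `W` and a hypothetical residual bottleneck adapter (`A`, `B`); it is not the LFI, GFA, or NPCA design from the paper, only the parameter-efficient training pattern such components follow. The up-projection `B` starts at zero, so the adapted layer initially reproduces the frozen layer exactly, and only `A` and `B` receive gradient updates.

```python
import numpy as np

def forward(W, A, B, x):
    """Frozen linear layer plus a residual bottleneck adapter with tanh."""
    z = np.tanh(A @ x)            # adapter down-projection + nonlinearity
    return W @ x + B @ z, z       # frozen path + adapter path

def train_adapter(steps=20, d=8, r=4, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d, d)) / np.sqrt(d)   # stands in for frozen pre-trained weights
    W_frozen = W.copy()
    A = rng.normal(size=(r, d))                # trainable down-projection
    B = np.zeros((d, r))                       # trainable up-projection, zero-initialized
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    t = rng.normal(size=d)                     # toy target for the adapted layer

    losses = []
    for _ in range(steps):
        h, z = forward(W, A, B, x)
        e = h - t
        losses.append(0.5 * float(e @ e))
        # Manual gradients of 0.5 * ||h - t||^2 with respect to the adapter only.
        grad_B = np.outer(e, z)
        grad_u = (B.T @ e) * (1.0 - z ** 2)    # backprop through tanh
        grad_A = np.outer(grad_u, x)
        A -= lr * grad_A
        B -= lr * grad_B
        # W is never updated: the backbone stays frozen.
    return W, W_frozen, losses

W, W_frozen, losses = train_adapter()
print(np.array_equal(W, W_frozen))   # True: frozen weights untouched
print(losses[0], losses[-1])
```

Zero-initializing `B` is a common adapter choice: training starts from the pre-trained behavior and only gradually injects task-specific (here, toy) knowledge.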
S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens
Face Anti-Spoofing (FAS) aims to detect malicious attempts to invade a face
recognition system by presenting spoofed faces. State-of-the-art FAS techniques
predominantly rely on deep learning models, but their cross-domain
generalization capabilities are often hindered by the domain shift problem,
which arises due to different distributions between training and testing data.
In this study, we develop a generalized FAS method under the Efficient
Parameter Transfer Learning (EPTL) paradigm, where we adapt the pre-trained
Vision Transformer models for the FAS task. During training, the adapter
modules are inserted into the pre-trained ViT model, and the adapters are
updated while other pre-trained parameters remain fixed. We find that
previous vanilla adapters are limited because they are based on linear
layers, which lack a spoofing-aware inductive bias and thus restrict
cross-domain generalization. To address this limitation and achieve
cross-domain generalized FAS, we propose a novel Statistical Adapter
(S-Adapter) that gathers local discriminative and statistical information from
localized token histograms. To further improve the generalization of the
statistical tokens, we propose a novel Token Style Regularization (TSR), which
aims to reduce domain style variance by regularizing Gram matrices extracted
from tokens across different domains. Experimental results demonstrate that
the proposed S-Adapter and TSR provide significant benefits in both zero-shot
and few-shot cross-domain testing, outperforming state-of-the-art methods on
several benchmark tests. We will release the source code upon acceptance.
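Token Style Regularization is described as regularizing Gram matrices of tokens across domains. A minimal NumPy sketch, assuming tokens arrive as an `(N, C)` array per domain, Gram matrices normalized by the token count, and a mean squared Frobenius distance over all domain pairs as the penalty; the paper's exact normalization and loss may differ, and `gram` and `tsr_loss` are hypothetical names.

```python
import itertools
import numpy as np

def gram(tokens):
    """Channel-by-channel Gram matrix of an (N, C) token array, averaged over tokens."""
    tokens = np.asarray(tokens, dtype=float)
    return tokens.T @ tokens / tokens.shape[0]

def tsr_loss(domain_tokens):
    """Mean squared Frobenius distance between the Gram matrices of every domain pair.

    A low value means the domains share similar token "style" statistics.
    """
    grams = [gram(t) for t in domain_tokens]
    pairs = list(itertools.combinations(grams, 2))
    if not pairs:
        return 0.0
    return float(np.mean([np.sum((g1 - g2) ** 2) for g1, g2 in pairs]))

rng = np.random.default_rng(0)
src_a = rng.normal(size=(196, 16))          # e.g. 14x14 patch tokens with 16 channels
src_b = 3.0 * rng.normal(size=(196, 16))    # tokens with a different scale ("style")
print(tsr_loss([src_a, src_a]))             # 0.0 for identical domains
print(tsr_loss([src_a, src_b]) > 0.0)       # True: the style mismatch is penalized
```

Because a Gram matrix discards token positions and keeps only channel co-activation statistics, penalizing its differences targets domain style rather than spatial content.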
Towards more efficient security inspection via deep learning: a task-driven x-ray image cropping scheme
X-ray imaging machines are widely used at border control checkpoints and in public transportation for luggage scanning and inspection. Recent advances in deep learning have enabled automatic object detection on X-ray images, greatly reducing labor costs. Compared to tasks on natural images, object detection for X-ray inspection is typically more challenging, due to the varied sizes and aspect ratios of X-ray images, the random locations of small target objects within redundant background regions, etc. In practice, we show that directly applying off-the-shelf deep learning-based detection algorithms to X-ray imagery can be highly time-consuming and ineffective. To this end, we propose a Task-Driven Cropping scheme, dubbed TDC, that improves deep image detection algorithms for efficient and effective luggage inspection via X-ray images. Instead of processing entire X-ray images for object detection, we propose a two-stage strategy that first adaptively crops X-ray images and preserves only the task-related regions, i.e., the luggage regions for security inspection. A task-specific deep feature extractor is used to rapidly estimate the importance of each X-ray image pixel. Only the regions that are useful for and related to the detection task are kept and passed to the follow-up deep detector. The varied-scale X-ray images are thus reduced to the same size and aspect ratio, which enables a more efficient deep detection pipeline. In addition, to benchmark the effectiveness of X-ray image detection algorithms, we propose a novel dataset for X-ray image detection, dubbed SIXray-D, based on the popular SIXray dataset. In SIXray-D, we provide complete and more accurate annotations of both object classes and bounding boxes, which enable model training for supervised X-ray detection methods.
Our results show that the proposed TDC algorithm effectively boosts popular detection algorithms, either by achieving better detection mAPs or by reducing run time.
Published version
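The first (cropping) stage of this pipeline can be sketched without the learned network. In this minimal NumPy example the per-pixel importance map is assumed to be given (in TDC it comes from a task-specific deep feature extractor); the sketch thresholds it, keeps the bounding box of the important pixels, and resizes the crop with nearest-neighbor sampling to a fixed output size. The threshold, the output size, and the name `task_driven_crop` are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def task_driven_crop(image, importance, thresh=0.5, out_size=(64, 64)):
    """Crop `image` to the box covering pixels with importance >= thresh, then
    resize the crop to `out_size` (height, width) by nearest-neighbor sampling."""
    image = np.asarray(image)
    mask = np.asarray(importance) >= thresh
    if mask.any():
        ys, xs = np.nonzero(mask)
        y0, y1 = ys.min(), ys.max() + 1
        x0, x1 = xs.min(), xs.max() + 1
        crop = image[y0:y1, x0:x1]
    else:
        crop = image                      # nothing marked important: keep the full image
    h, w = crop.shape[:2]
    out_h, out_w = out_size
    rows = np.arange(out_h) * h // out_h  # nearest source row for each output row
    cols = np.arange(out_w) * w // out_w  # nearest source column for each output column
    return crop[rows][:, cols]

# A 100x300 "scan" whose luggage occupies rows 20..59 and columns 100..219.
scan = np.zeros((100, 300))
scan[20:60, 100:220] = 1.0
importance = scan.copy()                  # stand-in for a learned importance map
out = task_driven_crop(scan, importance)
print(out.shape)                          # (64, 64) regardless of the input size
print(out.min())                          # 1.0: only the luggage region survives
```

Mapping every input, whatever its original size or aspect ratio, to one fixed shape is what lets the downstream detector run on uniform batches.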
Learning meta pattern for face anti-spoofing
Face Anti-Spoofing (FAS) is essential to secure face recognition systems and has been extensively studied in recent years. Although deep neural networks (DNNs) for the FAS task have achieved promising results in intra-dataset experiments with similar distributions of training and testing data, their generalization ability is limited in cross-domain scenarios where training and testing data follow different distributions. To improve generalization, recent hybrid methods extract task-aware handcrafted features (e.g., Local Binary Pattern) as discriminative inputs to DNNs. However, handcrafted feature extraction relies on experts' domain knowledge, and how to choose appropriate handcrafted features is underexplored. To this end, we propose a learnable network to extract a Meta Pattern (MP) in our learning-to-learn framework. By replacing handcrafted features with the MP, the discriminative information from the MP enables learning a more generalized model. Moreover, we devise a two-stream network that hierarchically fuses the input RGB image and the extracted MP using our proposed Hierarchical Fusion Module (HFM). Comprehensive experiments show that our MP outperforms the compared handcrafted features. 
In addition, our proposed method with HFM and MP achieves state-of-the-art performance on two different domain generalization evaluation benchmarks.
Nanyang Technological University
This work was supported in part by the Rapid-Rich Object Search (ROSE) Laboratory, Nanyang Technological University, in part by Nanyang Technological University (NTU)-Peking University (PKU) Joint Research Institute (a collaboration between the NTU and PKU that is sponsored by a donation from the Ng Teng Fong Charitable Foundation), in part by the Science and Technology Foundation of Guangzhou Huangpu Development District under Grant 2019GH16, and in part by the China-Singapore International Joint Research Institute under Grant 206-A018001. The work of Haoliang Li was supported by the CityU New Research Initiatives/Infrastructure Support from Central under Grant APRC 9610528.
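For reference, the handcrafted Local Binary Pattern feature that the Meta Pattern is meant to replace can be computed in a few lines. This is the basic 3x3, 8-neighbor LBP variant (a bit is set when a neighbor is greater than or equal to the center), written as an illustrative NumPy sketch; it shows the kind of fixed, expert-designed operator the learnable MP extractor substitutes for, not the MP network itself. The neighbor ordering in `OFFSETS` is one common convention, chosen here as an assumption.

```python
import numpy as np

# Clockwise neighbor offsets starting at the top-left pixel; bit i <-> offset i.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp(image):
    """Basic 8-neighbor LBP codes (0..255) for all interior pixels of a 2-D image."""
    img = np.asarray(image, dtype=float)
    H, W = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int32)
    for bit, (dy, dx) in enumerate(OFFSETS):
        # Shifted view aligned with `center`: the neighbor at offset (dy, dx).
        neighbor = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        codes += (neighbor >= center).astype(np.int32) << bit
    return codes

patch = np.array([[9, 0, 0],
                  [0, 5, 0],
                  [0, 0, 0]])
print(lbp(patch))   # [[1]]: only the top-left neighbor is >= the center
```

Every choice here (radius, number of neighbors, comparison rule, bit order) is fixed by hand, which is exactly the kind of expert decision a learned extractor avoids.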
One-class knowledge distillation for face presentation attack detection
Face presentation attack detection (PAD) has been extensively studied by the research community to enhance the security of face recognition systems. Although existing methods have achieved good performance on testing data whose distribution is similar to that of the training data, their performance degrades severely in application scenarios with data of unseen distributions. When the training and testing data are drawn from different domains, a typical approach is to apply domain adaptation techniques that improve face PAD performance with the help of target domain data. However, collecting sufficient data samples in the target domain, especially attack samples, has always been a non-trivial challenge. This paper introduces a teacher-student framework to improve the cross-domain performance of face PAD with one-class domain adaptation. In addition to the source domain data, the framework utilizes only a few genuine face samples of the target domain. Under this framework, a teacher network is trained with source domain samples to provide discriminative feature representations for face PAD. Student networks are trained to mimic the teacher network and learn similar representations for genuine face samples of the target domain. In the test phase, the similarity score between the representations of the teacher and student networks is used to distinguish attacks from genuine faces. To evaluate the proposed framework under one-class domain adaptation settings, we devised two new protocols and conducted extensive experiments. 
The experimental results show that our method outperforms baselines under one-class domain adaptation settings and even state-of-the-art methods with unsupervised domain adaptation.
Nanyang Technological University
National Research Foundation (NRF)
Submitted/Accepted version
This work was supported in part by the Nanyang Technological University (NTU)-Peking University (PKU) Joint Research Institute (a collaboration between Nanyang Technological University and Peking University that is sponsored by a donation from the Ng Teng Fong Charitable Foundation); in part by the Science and Technology Foundation of Guangzhou Huangpu Development District under Grant 2019GH16; in part by the China-Singapore International Joint Research Institute under Grant 206-A018001; and in part by the National Research Foundation, Prime Minister’s Office, Singapore, under its Strategic Capability Research Centres Funding Initiative. The work of Haoliang Li was supported by the City University of Hong Kong (CityU) New Research Initiatives/Infrastructure Support from Central under Grant APRC 9610528.
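The test-phase decision described above scores a face by how closely the student's representation matches the teacher's. A minimal NumPy sketch of that scoring step, plus a matching mimicking loss for genuine target-domain faces, assuming cosine similarity as the similarity measure and a hand-picked decision threshold; the feature vectors are placeholders for the outputs of trained networks, and `similarity_scores`, `distillation_loss`, and `is_genuine` are hypothetical names.

```python
import numpy as np

def similarity_scores(teacher_feats, student_feats, eps=1e-12):
    """Row-wise cosine similarity between teacher and student features, each (N, D)."""
    t = np.asarray(teacher_feats, dtype=float)
    s = np.asarray(student_feats, dtype=float)
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + eps)
    return np.sum(t * s, axis=1)

def distillation_loss(teacher_feats, student_feats):
    """Assumed mimicking loss on genuine target-domain faces: 1 - mean cosine similarity."""
    return float(1.0 - similarity_scores(teacher_feats, student_feats).mean())

def is_genuine(scores, threshold=0.5):
    """High teacher/student agreement -> genuine; low agreement -> attack."""
    return np.asarray(scores) >= threshold

teacher = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
student = np.array([[0.9, 0.1, 0.0],     # student mimics the teacher: genuine-like
                    [1.0, 0.0, 0.0]])    # student diverges: attack-like
scores = similarity_scores(teacher, student)
print(is_genuine(scores))                # [ True False]
```

Since the students only ever see genuine target-domain faces, they are expected to agree with the teacher on genuine inputs and drift on attacks, so low similarity serves as the attack signal without any target-domain attack samples.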